PhD+ 2024: Data Literacy in R
Session 3
ggplot2 was developed by Hadley Wickham in 2005.
Implements the graphics scheme described in the book The Grammar of Graphics by Leland Wilkinson.
Uses a standardized system of syntax that makes it easy(-ish) to learn.
It takes care of a lot of fiddly details such as colors, scales, and legend placement.
It does not do 3D or interactive graphics.
The Grammar of Graphics boiled down to 5 bullets, courtesy of Wickham (2016, p. 4):
a statistical graphic is a mapping from data to aesthetic attributes (location, color, shape, size) of geometric objects (points, lines, bars).
the geometric objects are drawn in a specific coordinate system.
scales control the mapping from data to aesthetics and provide tools to read the plot (i.e., axes and legends).
the plot may also contain statistical transformations of the data (means, medians, bins of data, trend lines).
faceting can be used to generate the same plot for different subsets of the data.
Specify data, aesthetics and geometric shapes
ggplot(data, aes(x=, y=, color=, shape=, size=)) +
geom_point(), or geom_histogram(), or geom_boxplot(), etc.
This combination is very effective for exploratory graphs.
The data must be a data frame.
The aes() function maps columns of the data frame to aesthetic properties of geometric shapes to be plotted.
ggplot() defines the plot; the geoms show the data; each component is added with +
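As a minimal sketch of the data + aes() + geom pattern described above, the following uses the `mpg` data frame that ships with ggplot2. It is only a stand-in (an assumption for illustration), not the homes data used in the session, and the column choices are arbitrary.

```r
library(ggplot2)

# `mpg` is a built-in ggplot2 data frame, used here as a stand-in data set.
# displ = engine displacement, hwy = highway miles per gallon, class = vehicle type.

# Scatterplot: map columns to x, y, and color, then add a point geom.
p <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point()
p

# Same data, different geom: a histogram only needs an x aesthetic.
h <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2)
h
```

Swapping the geom while leaving the ggplot() call alone is what makes this combination quick for exploratory work.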
Some examples should make this clear.
We’ll demonstrate ggplot2 using the Albemarle County real estate data, which was downloaded from the Office of Geographic Data Services.
Some variables of interest:
Note: the following examples use a sample of the homes data.
ggplot + geoms
A natural next step in exploratory graphing is to create plots of subsets of data. These are called facets in ggplot2.
Use facet_wrap() if you want to facet by one variable and have ggplot2 control the layout. Example:
+ facet_wrap(~ var)
Use facet_grid() if you want to facet by one or two variables and control the layout yourself.
Examples:
+ facet_grid(. ~ var1) - facets in columns
+ facet_grid(var1 ~ .) - facets in rows
+ facet_grid(var1 ~ var2) - facets in rows and columns
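The faceting forms listed above can be sketched as follows. This again assumes the built-in `mpg` data as a stand-in; `class`, `drv`, and `cyl` are simply convenient grouping columns in that data set.

```r
library(ggplot2)

# Base plot to be split into facets (mpg is a stand-in data set).
base <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# One variable, layout chosen by ggplot2: one panel per vehicle class.
p_wrap <- base + facet_wrap(~ class)
p_wrap

# One variable, facets in columns: one column per drive type.
p_cols <- base + facet_grid(. ~ drv)
p_cols

# Two variables: drive type in rows, number of cylinders in columns.
p_grid <- base + facet_grid(drv ~ cyl)
p_grid
```

With facet_grid(), combinations that have no data still get an (empty) panel, which keeps rows and columns aligned.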
facet_wrap
facet_grid (histograms)
coord_cartesian allows us to zoom in on a plot, as if using a magnifying glass.
coord_fixed allows us to control the “aspect ratio”.
coord_flip allows us to flip the x and y axes.
Scales control the mapping from data to aesthetics and provide tools to read the plot (i.e., axes and legends).
Every aesthetic has a default scale. To modify a scale, use a scale function.
All scale functions have a common naming scheme:
scale _ name of aesthetic _ name of scale
Examples: scale_y_continuous, scale_color_discrete, scale_fill_manual
Heads up: The documentation for ggplot2 scale functions will frequently use functions from the scales package (also by Wickham)!
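A rough sketch of the coord and scale functions described above, assuming the built-in `mpg` data as a stand-in. The limits, breaks, colors, and legend labels are arbitrary choices for illustration; label_number() comes from the scales package.

```r
library(ggplot2)
library(scales)  # provides label helpers such as label_number()

p <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point()

# coord_cartesian(): zoom in on a region without dropping any data.
p_zoom <- p + coord_cartesian(xlim = c(2, 4), ylim = c(20, 35))
p_zoom

# coord_fixed(): ratio is y/x, so 1/10 draws 10 units of y
# at the same length as 1 unit of x.
p + coord_fixed(ratio = 1 / 10)

# coord_flip(): swap the axes, handy for boxplots with long labels.
ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot() +
  coord_flip()

# Scale functions follow scale_<aesthetic>_<scale name>.
p_scaled <- p +
  scale_x_continuous(breaks = 2:7) +
  scale_y_continuous(labels = label_number(suffix = " mpg")) +
  scale_color_manual(
    values = c("4" = "black", "f" = "steelblue", "r" = "firebrick"),
    breaks = c("4", "f", "r"),
    labels = c("4-wheel", "front", "rear")
  )
p_scaled
```

Note that zooming with coord_cartesian() keeps every row in the plot data, which matters when a layer computes a statistic such as a trend line.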
The default ggplot2 theme is excellent. It follows the advice of several landmark papers regarding statistics and visual perception. (Wickham 2016, p. 176)
However, you can change the theme using ggplot2’s theming system. Built-in themes include theme_gray (default), theme_bw, theme_linedraw, theme_light, theme_dark, theme_minimal, theme_classic, and theme_void.
You can also update axis labels and titles using the labs function.
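A short sketch of switching themes and setting labels, assuming the built-in `mpg` data as a stand-in; the title and axis text are made up for illustration.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  # labs() sets the title and axis labels (legend titles too, by aesthetic name).
  labs(
    title = "Fuel economy by engine size",
    x = "Engine displacement (litres)",
    y = "Highway miles per gallon"
  ) +
  # Replace the default theme_gray() with another built-in theme.
  theme_minimal()
p
```

Themes change only the non-data elements (background, grid lines, fonts), so the plotted points are the same under any theme.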
Using the ggplotly() function from the {plotly} package, we can make (some) plots interactive.
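As a minimal sketch, assuming the plotly package is installed and again using the built-in `mpg` data as a stand-in, an existing ggplot object can be passed to ggplotly():

```r
library(ggplot2)
library(plotly)

# Build an ordinary static ggplot first.
p <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point()

# Convert it to an interactive htmlwidget with tooltips, zoom, and pan.
ip <- ggplotly(p)
ip  # displays in the RStudio Viewer or a browser when run interactively
```

Not every geom or theme setting converts cleanly, so it is worth checking the interactive version against the static one.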
ggplot(data, aes()) + geom!
The ggplot2 documentation has many good examples.
Chang, W. (2013), R Graphics Cookbook, O’Reilly. https://r-graphics.org/
Wickham, H. (2016), ggplot2: Elegant Graphics for Data Analysis (2nd ed), Springer. https://ggplot2-book.org/
Wickham, H. and Grolemund, G. (2017), R for Data Science, O’Reilly. https://r4ds.hadley.nz/
More on plotly
https://plotly-r.com
clayford@virginia.edu | GitHub @clayford